Talend Big Data – Spark Batch
Subscription: This content is available for Talend Academy subscription users.
Instructor-led: This content is available as instructor-led training.
This learning plan focuses on Spark Batch. After an introduction to Apache Spark, you work on two common Big Data use cases: sentiment analysis and download analysis.
Talend Big Data Spark Batch helps you use the most common Talend Big Data components, publish Jobs to Talend Administration Center, and schedule them.
Duration: 1 day (7 hours)
Target audience: Anyone who wants to use Talend Studio to interact with big data systems
Prerequisites: Completion of Talend Big Data Basics
Badge: Complete this learning plan to earn the Talend Big Data Developer Practitioner badge. To learn more about the criteria for earning this badge, refer to the Talend Academy Badging Program page.
Learning objectives: After completing this learning plan, you will be able to:
- Develop a Big Data Batch Job using the Spark framework
- Execute Spark Jobs in YARN client and cluster mode
- Enable Spark history server event logging
- Copy data from a local file to HDFS
- Copy data from MySQL to HDFS
- Create a Hive table and copy data from HDFS to it
- Import tweets to HDFS
- Join, sort, and aggregate data
- Use caches for faster processing
- Query data from a Hive table using Hive QL
- Query data from Spark datasets using Spark SQL
Training modules: To complete the learning plan, take the following training modules: